
vectorscale: pack-positions pre-pass + geometric crossing intersection#909

Draft
northbymidwest wants to merge 1 commit intolibretro:masterfrom
northbymidwest:vectorscale-pack-positions

@northbymidwest northbymidwest commented Apr 30, 2026

This PR has two parts: a per-fragment performance optimization (the pack-positions pre-pass) and a correctness improvement for crossings (geometric curve-curve intersection). Changes are confined to update-tjunction, cell-rasterizer, the preset, and the new pack-positions pass; similarity-graph, resolve-crossings, cell-graph, init-positions, and optimize-energy are untouched.

Performance — pack-positions pre-pass

New shader inserted between the final update-tjunction iteration and cell-rasterizer. For each CP slot, it packs the full per-CP render geometry into 3 horizontally-adjacent texels of a PackedPositions framebuffer:

(cx*3 + 0, cy_slot) = (pp.x, pp.y, prev_ci_or_-1, _)
(cx*3 + 1, cy_slot) = (cp.x, cp.y, t_branch, validity)   validity: 0=skip, 1=normal, 2=2-CP-line
(cx*3 + 2, cy_slot) = (np.x, np.y, next_ci_or_-1, _)

(pp, cp, np) is already ghost-extended for endpoint neighbors (pp = 2·prev − cp, and symmetrically np = 2·next − cp). t_branch is computed per CP type: the curve-curve intersection parameter for crossings (see the correctness section below), a closed-form cubic projection for one-sided clamped Beziers, and 0.5 otherwise.
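As a rough CPU-side illustration of the layout above (not the shader itself), the packing of one CP slot might look like the following Python sketch. The function names, the `prev_is_endpoint` / `next_is_endpoint` arguments, and the 0.0 written into the unused `_` channel are assumptions made for the example; only the texel layout and the ghost formula come from the description.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Texel = Tuple[float, float, float, float]

SKIP, NORMAL, LINE = 0.0, 1.0, 2.0  # validity codes from the layout above


def ghost(neighbor: Vec2, cp: Vec2) -> Vec2:
    """Ghost extension as written in the description: 2*neighbor - cp."""
    return (2.0 * neighbor[0] - cp[0], 2.0 * neighbor[1] - cp[1])


def texel_coords(cx: int, cy_slot: int) -> List[Tuple[int, int]]:
    """The 3 horizontally adjacent texel coordinates for one CP slot."""
    return [(cx * 3 + k, cy_slot) for k in range(3)]


def pack_cp(cp: Vec2, prev: Vec2, nxt: Vec2,
            prev_ci: int, next_ci: int,
            prev_is_endpoint: bool, next_is_endpoint: bool,
            t_branch: float = 0.5, validity: float = NORMAL) -> List[Texel]:
    """Build the 3 texels for one CP (hypothetical helper, not the PR's code)."""
    pp = ghost(prev, cp) if prev_is_endpoint else prev
    np_ = ghost(nxt, cp) if next_is_endpoint else nxt
    return [
        (pp[0], pp[1], float(prev_ci), 0.0),   # unused channel: 0.0 is an assumption
        (cp[0], cp[1], t_branch, validity),
        (np_[0], np_[1], float(next_ci), 0.0),
    ]


coords = texel_coords(2, 5)
texels = pack_cp(cp=(1.0, 1.0), prev=(0.0, 1.0), nxt=(2.0, 1.0),
                 prev_ci=3, next_ci=4,
                 prev_is_endpoint=True, next_is_endpoint=False)
```

Here the previous neighbor is flagged as an endpoint, so texel 0 carries the ghost point rather than the neighbor's own position, while texel 2 carries the next neighbor unchanged.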

What this lets the rasterizer's test_one_cp skip per pixel:

  • The 2 neighbor-index fetches from CellGraph rows 1, 2 — neighbors come back inline in B channels of texels 0/2.
  • The 2 follow-up FinalPositions fetches for prev/next.
  • The follow-up flag fetch on the neighbor needed for the IS_ENDPOINT check that drives ghost extension.
  • The in-shader ghost-construction math, 2-CP-chain branching, and t_branch cubic-solver call.

Cost added: 3 fetches from PackedPositions per CP probe instead of 1 from FinalPositions. Net per active probe: ~6 fetches → 4 (1 flag + 3 packed reads). The pack-positions pass itself runs once-per-frame, O(num_cps) work — 3 fragments per CP slot.
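The "1 flag + 3 packed reads" accounting can be illustrated with a small Python stand-in for the rasterizer's read path. This is only a sketch: `CountingTexture`, the dict-backed storage, and the assumption that the flag lives at `(cx, cy_slot)` in a separate texture with a nonzero R channel meaning "active" are all hypothetical; only the 3-texel layout and the validity codes are taken from the description.

```python
from typing import Dict, Optional, Tuple

Texel = Tuple[float, float, float, float]


class CountingTexture:
    """Dict-backed stand-in for a texture that counts fetches."""

    def __init__(self, texels: Dict[Tuple[int, int], Texel]) -> None:
        self.texels = texels
        self.fetches = 0

    def fetch(self, x: int, y: int) -> Texel:
        self.fetches += 1
        return self.texels[(x, y)]


def read_packed_cp(flags: CountingTexture, packed: CountingTexture,
                   cx: int, cy_slot: int) -> Optional[dict]:
    """One CP probe: 1 flag fetch, then 3 packed fetches if the slot is active."""
    if flags.fetch(cx, cy_slot)[0] == 0.0:   # assumed flag encoding
        return None
    t0 = packed.fetch(cx * 3 + 0, cy_slot)
    t1 = packed.fetch(cx * 3 + 1, cy_slot)
    t2 = packed.fetch(cx * 3 + 2, cy_slot)
    if t1[3] == 0.0:                         # validity 0 = skip
        return None
    return {
        "pp": t0[:2], "cp": t1[:2], "np": t2[:2],
        "prev_ci": int(t0[2]), "next_ci": int(t2[2]),
        "t_branch": t1[2],
        "is_line": t1[3] == 2.0,             # validity 2 = 2-CP line
    }


flags = CountingTexture({(2, 5): (1.0, 0.0, 0.0, 0.0),
                         (0, 0): (0.0, 0.0, 0.0, 0.0)})
packed = CountingTexture({
    (6, 5): (-1.0, 1.0, 3.0, 0.0),
    (7, 5): (1.0, 1.0, 0.5, 1.0),
    (8, 5): (2.0, 1.0, 4.0, 0.0),
})

hit = read_packed_cp(flags, packed, 2, 5)
fetches_for_hit = flags.fetches + packed.fetches
miss = read_packed_cp(flags, packed, 0, 0)
fetches_for_miss = flags.fetches + packed.fetches - fetches_for_hit
```

An active probe costs four fetches in this model and an inactive one costs a single flag fetch, which matches the counts quoted above.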

Measured ~5–10% frame-time reduction on dense frames (sprites with many active CPs / large viewport scale where the rasterizer is the dominant cost). Sparser frames see less; fewer active CPs means fewer per-pixel probes get past the flag check and the existing 3-sample distance² quick screen, so the rasterizer spends proportionally less of its time in the body that pack-positions actually shortens.

resolve_hit's neighbor flag/dir lookups for color resolution are unchanged — neighbor indices still flow through to it via the packed texels' B channels.

Correctness — respect the optimizer's crossing positions

At a 4-way crossing the rasterizer's wedge-AA junction lines need to anchor at the geometric meeting point of the N-S and E-W B-spline curves. The previous approach didn't compute that meeting point directly: update-tjunction ran a ghost-aware inverse B-spline correction that relocated each crossing CP to the position that would make the rendered curve pass through the grid corner at exactly t=0.5. That overrode the position the optimizer had chosen — the CP got pulled away from its energy-minimum back toward the integer grid corner so the rasterizer's downstream t_branch=0.5 assumption worked out.

This PR keeps the optimizer's crossing position and computes the actual curve-curve intersection inline in pack-positions:

  • Solve F(t, s) = B_a(t) − B_b(s) = 0 by 2D Newton iteration starting from (t, s) = (0.5, 0.5). The optimizer keeps crossings near the grid corner so the initial guess is within ~0.1 of the answer; quadratic convergence gets the residual below f32 epsilon in 3 iterations (4 used for safety). Each step inverts the 2×2 Jacobian analytically; an early break on |det(J)| < 1e-12 handles the tangent / parallel-curves degenerate case (doesn't fire in practice).
  • Slot 0 of the crossing pair gets t_a (parameter on the N-S curve), slot 1 gets t_b (E-W). The CP itself stays at the optimizer's final position — no relocation.
  • The rasterizer reads t_branch straight from PackedPositions and uses it as the wedge-AA junction parameter, so J = beval(curve, t_branch) lands on the geometric intersection at whatever t the two curves actually cross.
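The Newton solve described in the bullets above can be sketched on the CPU as follows. This is a minimal Python illustration under stated assumptions, not the shader code: it assumes each curve is a uniform quadratic B-spline segment over its (pp, cp, np) triple, `bderiv` and `intersect` are hypothetical names, and the control points are made-up values near a grid corner. It runs in double precision, whereas the shader works in f32.

```python
from typing import Tuple

Vec2 = Tuple[float, float]
Curve = Tuple[Vec2, Vec2, Vec2]  # (pp, cp, np) control triple


def beval(c: Curve, t: float) -> Vec2:
    """Uniform quadratic B-spline segment (assumed basis)."""
    (px, py), (cx, cy), (nx, ny) = c
    w0 = 0.5 * (1.0 - t) ** 2
    w2 = 0.5 * t * t
    w1 = 1.0 - w0 - w2
    return (w0 * px + w1 * cx + w2 * nx, w0 * py + w1 * cy + w2 * ny)


def bderiv(c: Curve, t: float) -> Vec2:
    """Derivative of the segment above with respect to t."""
    (px, py), (cx, cy), (nx, ny) = c
    return ((1.0 - t) * (cx - px) + t * (nx - cx),
            (1.0 - t) * (cy - py) + t * (ny - cy))


def intersect(a: Curve, b: Curve, iters: int = 4,
              det_eps: float = 1e-12) -> Tuple[float, float]:
    """Solve B_a(t) - B_b(s) = 0 by 2D Newton iteration from (0.5, 0.5)."""
    t = s = 0.5
    for _ in range(iters):
        ax, ay = beval(a, t)
        bx, by = beval(b, s)
        fx, fy = ax - bx, ay - by
        dax, day = bderiv(a, t)
        dbx, dby = bderiv(b, s)
        # Jacobian J = [[dax, -dbx], [day, -dby]]
        det = dax * (-dby) - (-dbx) * day
        if abs(det) < det_eps:      # tangent / parallel curves: give up
            break
        # Cramer's rule for J * (dt, ds) = -(fx, fy)
        dt = (-fx * (-dby) + (-dbx) * fy) / det
        ds = (-dax * fy + day * fx) / det
        t += dt
        s += ds
    return t, s


# Made-up N-S and E-W triples that cross near the origin.
ns: Curve = ((0.1, -1.0), (0.05, 0.02), (0.0, 1.0))
ew: Curve = ((-1.0, 0.0), (-0.03, 0.04), (1.0, 0.1))
t_a, t_b = intersect(ns, ew)
p, q = beval(ns, t_a), beval(ew, t_b)
degenerate = intersect(ns, ns)  # identical curves: det is 0, guess is returned
```

In this model `t_a` would go to slot 0 and `t_b` to slot 1 of the crossing pair, and evaluating either curve at its own parameter lands on the same point.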

update-tjunction.slang loses the IS_CROSSING branch entirely; only T-junction stem snap remains. The Opt2 sampler / read_orig_pos helper / Opt2Size UBO field are gone (no longer read).

Pipeline diff

11 passes (was 10):

similarity-graph
resolve-crossings
cell-graph
init-positions
optimize-energy ×2
update-tjunction ×3
+ pack-positions      ← new
cell-rasterizer

vectorscale.slangp updated accordingly.

Verification

  • All 8 shaders compile cleanly to SPIR-V via glslangValidator.
  • Visual smoke test in RetroArch (Vulkan backend) on Game Boy and pixel-art content shows no regressions vs the previous version; crossing CPs now sit where the optimizer wanted them.

Co-authored-with @anthropic-ai/claude-code.

Adds a per-CP pre-pass (pack-positions) that denormalizes render
geometry into a single PackedPositions texture and folds the crossing
curve-curve intersection into the same pass. The rasterizer reads its
full per-CP geometry from PackedPositions and skips ghost extension,
neighbor-index decoding, and t_branch solving in its hot loop.

New shader: pack-positions.slang
For each CP slot, packs into 3 horizontally-adjacent texels:
  col 0 = (pp.x, pp.y, prev_ci_or_-1, _)
  col 1 = (cp.x, cp.y, t_branch, validity 0=skip 1=normal 2=line)
  col 2 = (np.x, np.y, next_ci_or_-1, _)

(pp, cp, np) is the ghost-extended (pp = 2·prev - cp etc.) Bezier
control triple. t_branch is computed per CP type:

- IS_CROSSING: 2D Newton iteration on F(t,s) = B_a(t) - B_b(s) = 0,
  starting from (0.5, 0.5). The optimizer keeps crossings near the
  grid corner so the initial guess is within ~0.1 of the answer;
  4 iterations drive the residual below f32 epsilon. Reads neighbor
  positions from both this slot's chain (N-S or E-W) and the partner
  slot's chain.

  This replaces the legacy ghost-aware inverse-correction that moved
  each crossing CP so the rendered curve passed through the grid
  corner at t=0.5. The CP now stays at its optimizer-final position
  and the rasterizer's wedge AA anchors at the geometric intersection
  B_a(t) = B_b(s).

- 2-CP chain (degenerate stem with both ends as endpoint markers):
  t_branch = 0.5; render geometry pre-built as a straight line so the
  rasterizer dispatches to its closed-form line solver via is_line.

- One-sided clamped Bezier (prev or next is endpoint): closed-form
  cubic projection of the interior B-spline midpoint onto the clamped
  span, which finds the t at which the rendered clamped curve reaches
  the same physical "before/after sc" boundary an interior B-spline
  would at t=0.5.

- Else: t_branch = 0.5.

Modified: update-tjunction.slang
Drop the IS_CROSSING ghost-aware inverse-correction branch; crossings
pass through unchanged. Drops the now-unused Opt2 sampler binding,
read_orig_pos helper, and Opt2Size UBO field.

Modified: cell-rasterizer.slang
Replace read_pos + read_neighbors + ghost extension + 2-CP-chain
construction + t_branch cubic-solver in test_one_cp with a single
read_packed_cp(ci) call returning a PackedCp struct. Per-active-probe
fetch count: ~6 → 4 (1 flag + 3 packed reads). resolve_hit's
neighbor-direction lookups for color resolution are unchanged.

Modified: vectorscale.slangp
11 passes (was 10). pack-positions inserted between the final
update-tjunction iteration (FinalPositions) and cell-rasterizer.
PackedPositions framebuffer width is 3.0 × source (source-relative scale).
@northbymidwest northbymidwest marked this pull request as draft April 30, 2026 19:59
@northbymidwest northbymidwest marked this pull request as ready for review April 30, 2026 20:28
@northbymidwest northbymidwest marked this pull request as draft April 30, 2026 22:32